Overview

Brought to you by YData

Dataset statistics

Number of variables: 12
Number of observations: 2226382
Missing cells: 2640112
Missing cells (%): 9.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 203.8 MiB
Average record size in memory: 96.0 B
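
The derived figures in this block follow from the raw counts. A minimal sketch of that arithmetic, using only the numbers reported here (the variable names are illustrative, not part of the report):

```python
# Counts copied from the Dataset statistics block.
n_variables = 12
n_observations = 2_226_382
missing_cells = 2_640_112
total_mib = 203.8  # reported total size in memory

# Missing cells are expressed as a share of all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total bytes divided by the number of rows.
record_bytes = total_mib * 1024**2 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {record_bytes:.1f} B")
```

Both results agree with the reported 9.9% and 96.0 B after rounding.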

Variable types

Numeric: 8
Categorical: 1
Text: 3

Alerts

High correlation: bath is highly overall correlated with bed and 2 other fields
High correlation: bed is highly overall correlated with bath and 1 other fields
High correlation: house_size is highly overall correlated with bath and 2 other fields
High correlation: price is highly overall correlated with bath and 1 other fields
Missing: bed has 481317 (21.6%) missing values
Missing: bath has 511771 (23.0%) missing values
Missing: acre_lot has 325589 (14.6%) missing values
Missing: house_size has 568484 (25.5%) missing values
Missing: prev_sold_date has 734297 (33.0%) missing values
Skewed: price is highly skewed (γ1 = 546.3030625)
Skewed: bed is highly skewed (γ1 = 56.65481293)
Skewed: bath is highly skewed (γ1 = 152.4149966)
Skewed: acre_lot is highly skewed (γ1 = 106.2802845)
Skewed: house_size is highly skewed (γ1 = 1286.9001)
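
The γ1 values in the skewness alerts measure how asymmetric a distribution is. The sketch below computes the population moment coefficient g1 = m3 / m2^1.5 on hypothetical toy data; the report's own estimator may differ slightly (for example by applying a sample bias correction), so treat this as an illustration of the quantity rather than a reproduction of the report's numbers.

```python
def skewness(values):
    """Population moment coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy data: a symmetric sample and one with a single extreme value.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]

print(skewness(symmetric))     # symmetric data has zero skewness
print(skewness(with_outlier))  # one large value pulls the skewness positive
```

A handful of extreme values is enough to produce a large positive γ1, which is consistent with the maxima listed for price, bed, bath, acre_lot, and house_size.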

Reproduction

Analysis started: 2024-09-02 21:18:25.087308
Analysis finished: 2024-09-02 21:19:24.387923
Duration: 59.3 seconds
Software version: ydata-profiling v4.9.0
Download configuration: config.json
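
A report of this kind can be regenerated with ydata-profiling. The sketch below is a hedged outline: `build_report`, the file paths, and the report title are hypothetical, and the configuration actually used for this run lives in config.json and is not reproduced here. The second half only checks the reported duration against the two timestamps.

```python
from datetime import datetime

def build_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report (needs pandas and ydata-profiling)."""
    # Imported lazily so the timing check below runs without these packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Profiling report")  # placeholder title
    report.to_file(html_path)

# Duration check from the timestamps listed in the Reproduction block.
started = datetime.fromisoformat("2024-09-02 21:18:25.087308")
finished = datetime.fromisoformat("2024-09-02 21:19:24.387923")
duration = (finished - started).total_seconds()
print(f"{duration:.1f} seconds")
```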

Variables

brokered_by
Real number (ℝ)

Distinct: 110143
Distinct (%): 5.0%
Missing: 4533
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 52939.893
Minimum: 0
Maximum: 110142
Zeros: 12
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 8485
Q1: 23861
Median: 52884
Q3: 79183
95-th percentile: 105405
Maximum: 110142
Range: 110142
Interquartile range (IQR): 55322

Descriptive statistics

Standard deviation: 30642.753
Coefficient of variation (CV): 0.57882158
Kurtosis: -1.1244379
Mean: 52939.893
Median Absolute Deviation (MAD): 27089
Skewness: 0.14739295
Sum: 1.1762445 × 10^11
Variance: 9.389783 × 10^8
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value                    Count      Frequency (%)
22611                    45658      2.1%
16829                    27732      1.2%
53016                    21709      1.0%
23592                    9176       0.4%
30807                    8464       0.4%
33714                    6928       0.3%
57595                    6410       0.3%
84534                    5502       0.2%
109978                   5365       0.2%
109914                   5231       0.2%
Other values (110133)    2079674    93.4%

Value     Count    Frequency (%)
0         12       < 0.1%
1         5        < 0.1%
2         9        < 0.1%
3         2        < 0.1%
4         6        < 0.1%
5         4        < 0.1%
6         5        < 0.1%
7         2        < 0.1%
8         277      < 0.1%
9         4        < 0.1%

Value     Count    Frequency (%)
110142    8        < 0.1%
110141    6        < 0.1%
110140    3        < 0.1%
110139    1        < 0.1%
110138    62       < 0.1%
110137    1        < 0.1%
110136    1        < 0.1%
110135    23       < 0.1%
110134    1        < 0.1%
110133    15       < 0.1%
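
Several of the brokered_by statistics are simple functions of one another. As a quick consistency check, a sketch using only the values reported for this variable:

```python
# Values copied from the brokered_by quantile and descriptive statistics.
q1, q3 = 23861, 79183
mean = 52939.893
std = 30642.753

iqr = q3 - q1        # interquartile range
cv = std / mean      # coefficient of variation
variance = std ** 2  # variance is the squared standard deviation

print(iqr)
print(f"{cv:.4f}")
print(f"{variance:.4e}")
```

The results match the reported IQR (55322), CV (0.57882158), and variance (9.389783 × 10^8) to the precision shown.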

status
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 17.0 MiB

Value             Count
for_sale          1389306
sold              812009
ready_to_build    25067

Length

Max length: 14
Median length: 8
Mean length: 6.6086691
Min length: 4

Characters and Unicode

Total characters: 14713422
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: for_sale
2nd row: for_sale
3rd row: for_sale
4th row: for_sale
5th row: for_sale

Common Values

Value             Count      Frequency (%)
for_sale          1389306    62.4%
sold              812009     36.5%
ready_to_build    25067      1.1%
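
The frequency percentages, the total character count, and the mean length reported for status can all be derived from the three value counts. A short sketch of that derivation:

```python
# Value counts copied from the Common Values table for status.
counts = {"for_sale": 1_389_306, "sold": 812_009, "ready_to_build": 25_067}

n = sum(counts.values())
shares = {value: 100 * c / n for value, c in counts.items()}

# Each row contributes len(value) characters.
total_chars = sum(len(value) * c for value, c in counts.items())
mean_length = total_chars / n

for value, pct in shares.items():
    print(f"{value}: {pct:.1f}%")
print(total_chars, round(mean_length, 7))
```

The totals agree with the reported 2226382 observations, 14713422 characters, and mean length 6.6086691.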

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values for_sale (62.4%), sold (36.5%), ready_to_build (1.1%)]

Most occurring characters

Value               Count      Frequency (%)
o                   2226382    15.1%
l                   2226382    15.1%
s                   2201315    15.0%
_                   1439440    9.8%
r                   1414373    9.6%
a                   1414373    9.6%
e                   1414373    9.6%
f                   1389306    9.4%
d                   862143     5.9%
y                   25067      0.2%
Other values (4)    100268     0.7%

Most occurring categories

Value                    Count       Frequency (%)
Lowercase Letter         13273982    90.2%
Connector Punctuation    1439440     9.8%
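
The category counts can be reproduced with Python's standard `unicodedata` module, where `Ll` is Lowercase Letter and `Pc` is Connector Punctuation. This is a sketch built from the three status values and their counts; it is not necessarily how the report computes the table internally.

```python
import unicodedata
from collections import Counter

# Value counts copied from the Common Values table for status.
counts = {"for_sale": 1_389_306, "sold": 812_009, "ready_to_build": 25_067}

category_totals = Counter()
for value, n_rows in counts.items():
    for ch in value:
        # Every occurrence of the value contributes one count per character.
        category_totals[unicodedata.category(ch)] += n_rows

print(category_totals["Ll"], category_totals["Pc"])
```

The two totals match the 13273982 lowercase letters and 1439440 connector punctuation characters in the table.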

Most frequent character per category

Lowercase Letter
Value               Count      Frequency (%)
o                   2226382    16.8%
l                   2226382    16.8%
s                   2201315    16.6%
r                   1414373    10.7%
a                   1414373    10.7%
e                   1414373    10.7%
f                   1389306    10.5%
d                   862143     6.5%
y                   25067      0.2%
t                   25067      0.2%
Other values (3)    75201      0.6%

Connector Punctuation
Value    Count      Frequency (%)
_        1439440    100.0%

Most occurring scripts

Value     Count       Frequency (%)
Latin     13273982    90.2%
Common    1439440     9.8%

Most frequent character per script

Latin
Value               Count      Frequency (%)
o                   2226382    16.8%
l                   2226382    16.8%
s                   2201315    16.6%
r                   1414373    10.7%
a                   1414373    10.7%
e                   1414373    10.7%
f                   1389306    10.5%
d                   862143     6.5%
y                   25067      0.2%
t                   25067      0.2%
Other values (3)    75201      0.6%

Common
Value    Count      Frequency (%)
_        1439440    100.0%

Most occurring blocks

Value    Count       Frequency (%)
ASCII    14713422    100.0%

Most frequent character per block

ASCII
Value               Count      Frequency (%)
o                   2226382    15.1%
l                   2226382    15.1%
s                   2201315    15.0%
_                   1439440    9.8%
r                   1414373    9.6%
a                   1414373    9.6%
e                   1414373    9.6%
f                   1389306    9.4%
d                   862143     5.9%
y                   25067      0.2%
Other values (4)    100268     0.7%

price
Real number (ℝ)

Alerts: High correlation, Skewed

Distinct: 102137
Distinct (%): 4.6%
Missing: 1541
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 524195.52
Minimum: 0
Maximum: 2.1474836 × 10^9
Zeros: 280
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 30000
Q1: 165000
Median: 325000
Q3: 550000
95-th percentile: 1495000
Maximum: 2.1474836 × 10^9
Range: 2.1474836 × 10^9
Interquartile range (IQR): 385000

Descriptive statistics

Standard deviation: 2138893.2
Coefficient of variation (CV): 4.0803348
Kurtosis: 492423.19
Mean: 524195.52
Median Absolute Deviation (MAD): 180000
Skewness: 546.30306
Sum: 1.1662517 × 10^12
Variance: 4.5748642 × 10^12
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value                    Count      Frequency (%)
350000                   15430      0.7%
250000                   15254      0.7%
325000                   14040      0.6%
225000                   13939      0.6%
450000                   13213      0.6%
275000                   13084      0.6%
425000                   12769      0.6%
375000                   12055      0.5%
299900                   11791      0.5%
150000                   11315      0.5%
Other values (102127)    2091951    94.0%

Value    Count    Frequency (%)
0        280      < 0.1%
1        508      < 0.1%
2        14       < 0.1%
3        3        < 0.1%
4        2        < 0.1%
5        5        < 0.1%
6        6        < 0.1%
7        2        < 0.1%
8        14       < 0.1%
9        2        < 0.1%

Value         Count    Frequency (%)
2147483600    1        < 0.1%
1000000000    1        < 0.1%
875000000     1        < 0.1%
515000000     1        < 0.1%
295000000     1        < 0.1%
281500000     1        < 0.1%
250000000     1        < 0.1%
212500000     1        < 0.1%
169000000     1        < 0.1%
165000000     1        < 0.1%
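
The extreme price values sit far outside the bulk of the distribution. The sketch below applies Tukey's 1.5 × IQR fences to the reported quartiles; that rule is an assumption chosen for illustration, not something the report itself uses. It also shows that the listed maximum is what the largest signed 32-bit integer looks like when rounded to 8 significant digits, which may hint at a capped or sentinel value, although the report cannot confirm that.

```python
# Quartiles copied from the price quantile statistics.
q1, q3 = 165_000, 550_000
iqr = q3 - q1

# Tukey fences (assumed rule): values beyond them are candidate outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print(iqr, lower_fence, upper_fence)

# The reported maximum, 2147483600, compared with the int32 limit.
int32_max = 2**31 - 1
rounded = float(f"{int32_max:.8g}")  # keep 8 significant digits
print(int32_max, rounded)
```

Under this rule any price above 1127500 would be flagged, which already excludes the 95-th percentile of 1495000, so a percentile-based cut may be a better fit for data this skewed.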

bed
Real number (ℝ)

Alerts: High correlation, Missing, Skewed

Distinct: 99
Distinct (%): < 0.1%
Missing: 481317
Missing (%): 21.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2758407
Minimum: 1
Maximum: 473
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 3
Q3: 4
95-th percentile: 5
Maximum: 473
Range: 472
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.5672739
Coefficient of variation (CV): 0.47843408
Kurtosis: 12971.516
Mean: 3.2758407
Median Absolute Deviation (MAD): 1
Skewness: 56.654813
Sum: 5716555
Variance: 2.4563473
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value                Count     Frequency (%)
3                    753923    33.9%
4                    440566    19.8%
2                    311019    14.0%
5                    120637    5.4%
1                    65098     2.9%
6                    32209     1.4%
7                    8001      0.4%
8                    6103      0.3%
9                    2402      0.1%
10                   1378      0.1%
Other values (89)    3729      0.2%
(Missing)            481317    21.6%

Value    Count     Frequency (%)
1        65098     2.9%
2        311019    14.0%
3        753923    33.9%
4        440566    19.8%
5        120637    5.4%
6        32209     1.4%
7        8001      0.4%
8        6103      0.3%
9        2402      0.1%
10       1378      0.1%

Value    Count    Frequency (%)
473      1        < 0.1%
444      2        < 0.1%
222      1        < 0.1%
212      1        < 0.1%
210      1        < 0.1%
190      1        < 0.1%
148      1        < 0.1%
142      1        < 0.1%
136      1        < 0.1%
123      1        < 0.1%
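
With 50 fixed-width bins spanning the full range of bed (1 to 473), almost every observation falls into the first bin. The sketch below illustrates equal-width binning on the counts from the table; `bin_index` is a hypothetical helper, and the plotting library's exact edge handling may differ in detail.

```python
def bin_index(x, lo, hi, bins):
    """Index of the equal-width bin containing x; the last bin includes hi."""
    width = (hi - lo) / bins
    return min(int((x - lo) / width), bins - 1)

lo, hi, bins = 1, 473, 50  # reported minimum, maximum, and bin count

# Counts for 1 to 10 bedrooms, copied from the value table.
common = {3: 753_923, 4: 440_566, 2: 311_019, 5: 120_637, 1: 65_098,
          6: 32_209, 7: 8_001, 8: 6_103, 9: 2_402, 10: 1_378}
non_missing = 2_226_382 - 481_317  # observations minus missing values

in_first_bin = sum(c for v, c in common.items() if bin_index(v, lo, hi, bins) == 0)
print(f"bin width: {(hi - lo) / bins:.2f}")
print(f"share in first bin: {100 * in_first_bin / non_missing:.2f}%")
```

At least 99.79% of the non-missing values land in the first bin, so the histogram is dominated by a single bar; clipping or a log scale would be needed to see the shape of the bulk.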

bath
Real number (ℝ)

Alerts: High correlation, Missing, Skewed

Distinct: 86
Distinct (%): < 0.1%
Missing: 511771
Missing (%): 23.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.4964403
Minimum: 1
Maximum: 830
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 2
Q3: 3
95-th percentile: 4
Maximum: 830
Range: 829
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.6525725
Coefficient of variation (CV): 0.66197158
Kurtosis: 65874.151
Mean: 2.4964403
Median Absolute Deviation (MAD): 1
Skewness: 152.415
Sum: 4280424
Variance: 2.730996
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value                Count     Frequency (%)
2                    746294    33.5%
3                    471821    21.2%
1                    260131    11.7%
4                    157290    7.1%
5                    45563     2.0%
6                    17080     0.8%
7                    7114      0.3%
8                    4078      0.2%
9                    1902      0.1%
10                   1038      < 0.1%
Other values (76)    2300      0.1%
(Missing)            511771    23.0%

Value    Count     Frequency (%)
1        260131    11.7%
2        746294    33.5%
3        471821    21.2%
4        157290    7.1%
5        45563     2.0%
6        17080     0.8%
7        7114      0.3%
8        4078      0.2%
9        1902      0.1%
10       1038      < 0.1%

Value    Count    Frequency (%)
830      1        < 0.1%
752      1        < 0.1%
460      1        < 0.1%
222      2        < 0.1%
212      2        < 0.1%
198      1        < 0.1%
175      1        < 0.1%
163      1        < 0.1%
157      1        < 0.1%
123      1        < 0.1%

acre_lot
Real number (ℝ)

Alerts: Missing, Skewed

Distinct: 16057
Distinct (%): 0.8%
Missing: 325589
Missing (%): 14.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.223027
Minimum: 0
Maximum: 100000
Zeros: 2226
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.05
Q1: 0.15
Median: 0.26
Q3: 0.98
95-th percentile: 14.02
Maximum: 100000
Range: 100000
Interquartile range (IQR): 0.83

Descriptive statistics

Standard deviation: 762.8238
Coefficient of variation (CV): 50.109862
Kurtosis: 12542.323
Mean: 15.223027
Median Absolute Deviation (MAD): 0.16
Skewness: 106.28028
Sum: 28935824
Variance: 581900.15
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value                   Count      Frequency (%)
0.17                    66180      3.0%
0.14                    65258      2.9%
0.16                    55864      2.5%
0.23                    55742      2.5%
0.15                    52191      2.3%
0.18                    49004      2.2%
0.11                    46560      2.1%
0.19                    44391      2.0%
0.13                    42480      1.9%
0.2                     41376      1.9%
Other values (16047)    1381747    62.1%
(Missing)               325589     14.6%

Value    Count    Frequency (%)
0        2226     0.1%
0.01     8886     0.4%
0.02     20582    0.9%
0.03     24737    1.1%
0.04     24488    1.1%
0.05     23190    1.0%
0.06     23920    1.1%
0.07     24531    1.1%
0.08     21848    1.0%
0.09     27516    1.2%

Value     Count    Frequency (%)
100000    52       < 0.1%
99999     7        < 0.1%
98135     1        < 0.1%
96120     1        < 0.1%
95832     1        < 0.1%
94457     1        < 0.1%
93248     1        < 0.1%
91178     1        < 0.1%
91040     1        < 0.1%
90522     1        < 0.1%
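
For acre_lot the standard deviation (762.8238) is several thousand times larger than the Median Absolute Deviation (0.16), because the MAD ignores the magnitude of extreme values. A minimal sketch of the MAD on hypothetical toy lot sizes (not taken from the dataset):

```python
from statistics import median, pstdev

def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    center = median(values)
    return median([abs(x - center) for x in values])

# Hypothetical lot sizes in acres, with one extreme entry.
lots = [0.1, 0.2, 0.3, 0.4, 1000.0]

print(mad(lots))     # stays close to the typical spread of the small lots
print(pstdev(lots))  # dominated by the single 1000-acre entry
```

The same effect explains why the reported median (0.26) and MAD describe a typical lot far better than the mean (15.223027) does.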

street
Real number (ℝ)

Distinct: 2001358
Distinct (%): 90.3%
Missing: 10866
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1012324.9
Minimum: 0
Maximum: 2001357
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 101044.75
Q1: 506312.75
Median: 1012765.5
Q3: 1521173.2
95-th percentile: 1913044.2
Maximum: 2001357
Range: 2001357
Interquartile range (IQR): 1014860.5

Descriptive statistics

Standard deviation: 583763.48
Coefficient of variation (CV): 0.57665624
Kurtosis: -1.2130069
Mean: 1012324.9
Median Absolute Deviation (MAD): 507433
Skewness: -0.0092302458
Sum: 2.2428221 × 10^12
Variance: 3.407798 × 10^11
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value                     Count      Frequency (%)
1916862                   158        < 0.1%
1861860                   142        < 0.1%
1951128                   127        < 0.1%
1801524                   98         < 0.1%
793078                    87         < 0.1%
824804                    81         < 0.1%
222498                    77         < 0.1%
6365                      76         < 0.1%
274432                    76         < 0.1%
1862077                   75         < 0.1%
Other values (2001348)    2214519    99.5%
(Missing)                 10866      0.5%

Value    Count    Frequency (%)
0        2        < 0.1%
1        1        < 0.1%
2        1        < 0.1%
3        2        < 0.1%
4        1        < 0.1%
5        1        < 0.1%
6        1        < 0.1%
7        1        < 0.1%
8        1        < 0.1%
9        1        < 0.1%

Value      Count    Frequency (%)
2001357    1        < 0.1%
2001356    1        < 0.1%
2001355    1        < 0.1%
2001354    1        < 0.1%
2001353    1        < 0.1%
2001352    1        < 0.1%
2001351    1        < 0.1%
2001350    1        < 0.1%
2001349    1        < 0.1%
2001348    3        < 0.1%

city
Text

Distinct: 20098
Distinct (%): 0.9%
Missing: 1407
Missing (%): 0.1%
Memory size: 17.0 MiB

Length

Max length: 49
Median length: 44
Mean length: 9.0651117
Min length: 1

Characters and Unicode

Total characters: 20169647
Distinct characters: 69
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2947
Unique (%): 0.1%

Sample

1st row: Adjuntas
2nd row: Adjuntas
3rd row: Juana Diaz
4th row: Ponce
5th row: Mayaguez
Value                   Count      Frequency (%)
city                    70521      2.4%
beach                   41006      1.4%
new                     35840      1.2%
san                     35513      1.2%
saint                   29933      1.0%
lake                    28214      1.0%
houston                 23930      0.8%
springs                 21440      0.7%
york                    20171      0.7%
fort                    19712      0.7%
Other values (14952)    2617043    88.9%

Most occurring characters

Value                Count      Frequency (%)
e                    1862865    9.2%
a                    1782114    8.8%
o                    1505184    7.5%
n                    1470469    7.3%
l                    1348266    6.7%
i                    1238942    6.1%
r                    1234849    6.1%
t                    1049447    5.2%
s                    873953     4.3%
(space)              718355     3.6%
Other values (59)    7085203    35.1%

Most occurring categories

Value                Count       Frequency (%)
Lowercase Letter     16497973    81.8%
Uppercase Letter     2950086     14.6%
Space Separator      718355      3.6%
Other Punctuation    2327        < 0.1%
Decimal Number       890         < 0.1%
Dash Punctuation     16          < 0.1%

Most frequent character per category

Lowercase Letter
Value                Count      Frequency (%)
e                    1862865    11.3%
a                    1782114    10.8%
o                    1505184    9.1%
n                    1470469    8.9%
l                    1348266    8.2%
i                    1238942    7.5%
r                    1234849    7.5%
t                    1049447    6.4%
s                    873953     5.3%
d                    486422     2.9%
Other values (18)    3645462    22.1%

Uppercase Letter
Value                Count     Frequency (%)
C                    337267    11.4%
S                    291749    9.9%
B                    234005    7.9%
P                    217612    7.4%
M                    212579    7.2%
L                    186356    6.3%
H                    169810    5.8%
A                    142442    4.8%
W                    138386    4.7%
R                    124476    4.2%
Other values (16)    895404    30.4%

Decimal Number
Value    Count    Frequency (%)
1        157      17.6%
4        110      12.4%
2        97       10.9%
9        95       10.7%
3        95       10.7%
6        71       8.0%
5        70       7.9%
8        70       7.9%
0        64       7.2%
7        61       6.9%

Other Punctuation
Value    Count    Frequency (%)
'        2003     86.1%
.        312      13.4%
,        12       0.5%

Space Separator
Value      Count     Frequency (%)
(space)    718355    100.0%

Dash Punctuation
Value    Count    Frequency (%)
-        16       100.0%

Most occurring scripts

Value     Count       Frequency (%)
Latin     19448059    96.4%
Common    721588      3.6%

Most frequent character per script

Latin
Value                Count      Frequency (%)
e                    1862865    9.6%
a                    1782114    9.2%
o                    1505184    7.7%
n                    1470469    7.6%
l                    1348266    6.9%
i                    1238942    6.4%
r                    1234849    6.3%
t                    1049447    5.4%
s                    873953     4.5%
d                    486422     2.5%
Other values (44)    6595548    33.9%

Common
Value               Count     Frequency (%)
(space)             718355    99.6%
'                   2003      0.3%
.                   312       < 0.1%
1                   157       < 0.1%
4                   110       < 0.1%
2                   97        < 0.1%
9                   95        < 0.1%
3                   95        < 0.1%
6                   71        < 0.1%
5                   70        < 0.1%
Other values (5)    223       < 0.1%

Most occurring blocks

Value    Count       Frequency (%)
ASCII    20169643    > 99.9%
None     4           < 0.1%

Most frequent character per block

ASCII
Value                Count      Frequency (%)
e                    1862865    9.2%
a                    1782114    8.8%
o                    1505184    7.5%
n                    1470469    7.3%
l                    1348266    6.7%
i                    1238942    6.1%
r                    1234849    6.1%
t                    1049447    5.2%
s                    873953     4.3%
(space)              718355     3.6%
Other values (57)    7085199    35.1%

None
Value    Count    Frequency (%)
ó        3        75.0%
í        1        25.0%

state
Text

Distinct: 55
Distinct (%): < 0.1%
Missing: 8
Missing (%): < 0.1%
Memory size: 17.0 MiB

Length

Max length: 20
Median length: 13
Mean length: 8.3506293
Min length: 4

Characters and Unicode

Total characters: 18591624
Distinct characters: 47
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Puerto Rico
2nd row: Puerto Rico
3rd row: Puerto Rico
4th row: Puerto Rico
5th row: Puerto Rico
Value                Count      Frequency (%)
florida              249432     9.7%
california           227215     8.8%
texas                208335     8.1%
new                  176075     6.8%
carolina             128112     5.0%
york                 103159     4.0%
north                90013      3.5%
illinois             85280      3.3%
virginia             81072      3.1%
georgia              80977      3.1%
Other values (51)    1147586    44.5%

Most occurring characters

Value                Count      Frequency (%)
a                    2446490    13.2%
i                    2125529    11.4%
o                    1656872    8.9%
n                    1526221    8.2%
r                    1291528    6.9%
s                    1143277    6.1%
e                    1052159    5.7%
l                    1030116    5.5%
t                    438216     2.4%
h                    419464     2.3%
Other values (37)    5461752    29.4%

Most occurring categories

Value               Count       Frequency (%)
Lowercase Letter    15670111    84.3%
Uppercase Letter    2570631     13.8%
Space Separator     350882      1.9%

Most frequent character per category

Lowercase Letter
Value                Count      Frequency (%)
a                    2446490    15.6%
i                    2125529    13.6%
o                    1656872    10.6%
n                    1526221    9.7%
r                    1291528    8.2%
s                    1143277    7.3%
e                    1052159    6.7%
l                    1030116    6.6%
t                    438216     2.8%
h                    419464     2.7%
Other values (14)    2540239    16.2%

Uppercase Letter
Value                Count     Frequency (%)
C                    408253    15.9%
N                    287064    11.2%
M                    267532    10.4%
F                    249432    9.7%
T                    249299    9.7%
I                    152965    6.0%
A                    132504    5.2%
O                    128510    5.0%
W                    121199    4.7%
Y                    103159    4.0%
Other values (12)    470714    18.3%

Space Separator
Value      Count     Frequency (%)
(space)    350882    100.0%

Most occurring scripts

Value     Count       Frequency (%)
Latin     18240742    98.1%
Common    350882      1.9%

Most frequent character per script

Latin
Value                Count      Frequency (%)
a                    2446490    13.4%
i                    2125529    11.7%
o                    1656872    9.1%
n                    1526221    8.4%
r                    1291528    7.1%
s                    1143277    6.3%
e                    1052159    5.8%
l                    1030116    5.6%
t                    438216     2.4%
h                    419464     2.3%
Other values (36)    5110870    28.0%

Common
Value      Count     Frequency (%)
(space)    350882    100.0%

Most occurring blocks

Value    Count       Frequency (%)
ASCII    18591624    100.0%

Most frequent character per block

ASCII
Value                Count      Frequency (%)
a                    2446490    13.2%
i                    2125529    11.4%
o                    1656872    8.9%
n                    1526221    8.2%
r                    1291528    6.9%
s                    1143277    6.1%
e                    1052159    5.7%
l                    1030116    5.5%
t                    438216     2.4%
h                    419464     2.3%
Other values (37)    5461752    29.4%

zip_code
Real number (ℝ)

Distinct: 30334
Distinct (%): 1.4%
Missing: 299
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 52186.676
Minimum: 0
Maximum: 99999
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 8512
Q1: 29617
Median: 48382
Q3: 78070
95-th percentile: 95969
Maximum: 99999
Range: 99999
Interquartile range (IQR): 48453

Descriptive statistics

Standard deviation: 28954.085
Coefficient of variation (CV): 0.55481756
Kurtosis: -1.3131808
Mean: 52186.676
Median Absolute Deviation (MAD): 25950
Skewness: 0.092234425
Sum: 1.1617187 × 10^11
Variance: 8.3833901 × 10^8
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value                   Count      Frequency (%)
33993                   2472       0.1%
33981                   2282       0.1%
33974                   1996       0.1%
33160                   1718       0.1%
32909                   1707       0.1%
33139                   1589       0.1%
34288                   1424       0.1%
73099                   1413       0.1%
33953                   1380       0.1%
32908                   1377       0.1%
Other values (30324)    2208725    99.2%

Value    Count    Frequency (%)
0        1        < 0.1%
601      2        < 0.1%
602      42       < 0.1%
603      48       < 0.1%
604      1        < 0.1%
605      2        < 0.1%
606      5        < 0.1%
610      9        < 0.1%
612      65       < 0.1%
613      2        < 0.1%

Value    Count    Frequency (%)
99999    37       < 0.1%
99950    2        < 0.1%
99929    18       < 0.1%
99927    2        < 0.1%
99925    6        < 0.1%
99923    3        < 0.1%
99921    7        < 0.1%
99919    13       < 0.1%
99918    21       < 0.1%
99903    8        < 0.1%

house_size
Real number (ℝ)

Alerts: High correlation, Missing, Skewed

Distinct: 12061
Distinct (%): 0.7%
Missing: 568484
Missing (%): 25.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 2714.4713
Minimum: 4
Maximum: 1.0404004 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 4
5-th percentile: 847
Q1: 1300
Median: 1760
Q3: 2413
95-th percentile: 4008
Maximum: 1.0404004 × 10^9
Range: 1.0404004 × 10^9
Interquartile range (IQR): 1113

Descriptive statistics

Standard deviation: 808163.52
Coefficient of variation (CV): 297.72409
Kurtosis: 1656699.7
Mean: 2714.4713
Median Absolute Deviation (MAD): 528
Skewness: 1286.9001
Sum: 4.5003166 × 10^9
Variance: 6.5312827 × 10^11
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value                   Count      Frequency (%)
1200                    8938       0.4%
1500                    6316       0.3%
1800                    6272       0.3%
1400                    6108       0.3%
1600                    5630       0.3%
1440                    5590       0.3%
1000                    5567       0.3%
960                     5394       0.2%
1344                    5383       0.2%
1100                    5096       0.2%
Other values (12051)    1597604    71.8%
(Missing)               568484     25.5%

Value    Count    Frequency (%)
4        1        < 0.1%
100      22       < 0.1%
101      2        < 0.1%
102      1        < 0.1%
104      1        < 0.1%
108      1        < 0.1%
110      1        < 0.1%
111      3        < 0.1%
112      1        < 0.1%
115      1        < 0.1%

Value         Count    Frequency (%)
1040400400    1        < 0.1%
12992200      1        < 0.1%
9842382       1        < 0.1%
7971480       1        < 0.1%
3484800       1        < 0.1%
3434706       1        < 0.1%
1560780       1        < 0.1%
1454468       1        < 0.1%
1450112       1        < 0.1%
1306800       1        < 0.1%

prev_sold_date
Text

Alerts: Missing

Distinct: 14954
Distinct (%): 1.0%
Missing: 734297
Missing (%): 33.0%
Memory size: 17.0 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 14920850
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2386
Unique (%): 0.2%

Sample

1st row: 2020-02-28
2nd row: 2019-06-28
3rd row: 2021-09-15
4th row: 2021-03-15
5th row: 2013-10-11
Value                   Count      Frequency (%)
2022-03-31              17171      1.2%
2022-04-15              16297      1.1%
2022-04-22              15762      1.1%
2022-04-08              15038      1.0%
2022-02-28              14144      0.9%
2022-04-29              13783      0.9%
2021-11-30              12856      0.9%
2022-03-25              12558      0.8%
2022-02-25              12278      0.8%
2021-11-19              12076      0.8%
Other values (14944)    1350122    90.5%

Most occurring characters

Value    Count      Frequency (%)
2        4000417    26.8%
0        3356517    22.5%
-        2984170    20.0%
1        2111038    14.1%
3        492966     3.3%
4        454577     3.0%
9        400090     2.7%
8        331671     2.2%
5        292909     2.0%
7        263022     1.8%

Most occurring categories

Value               Count       Frequency (%)
Decimal Number      11936680    80.0%
Dash Punctuation    2984170     20.0%

Most frequent character per category

Decimal Number
Value    Count      Frequency (%)
2        4000417    33.5%
0        3356517    28.1%
1        2111038    17.7%
3        492966     4.1%
4        454577     3.8%
9        400090     3.4%
8        331671     2.8%
5        292909     2.5%
7        263022     2.2%
6        233473     2.0%

Dash Punctuation
Value    Count      Frequency (%)
-        2984170    100.0%

Most occurring scripts

Value     Count       Frequency (%)
Common    14920850    100.0%

Most frequent character per script

Common
Value    Count      Frequency (%)
2        4000417    26.8%
0        3356517    22.5%
-        2984170    20.0%
1        2111038    14.1%
3        492966     3.3%
4        454577     3.0%
9        400090     2.7%
8        331671     2.2%
5        292909     2.0%
7        263022     1.8%

Most occurring blocks

Value    Count       Frequency (%)
ASCII    14920850    100.0%

Most frequent character per block

ASCII
Value    Count      Frequency (%)
2        4000417    26.8%
0        3356517    22.5%
-        2984170    20.0%
1        2111038    14.1%
3        492966     3.3%
4        454577     3.0%
9        400090     2.7%
8        331671     2.2%
5        292909     2.0%
7        263022     1.8%

Interactions

[Figures: 64 pairwise interaction plots between the numeric variables]

Correlations

[Figure: correlation plot]

               acre_lot    bath     bed  brokered_by  house_size   price  status  street  zip_code
acre_lot          1.000   0.108   0.149       -0.004       0.252  -0.070   0.003   0.159    -0.072
bath              0.108   1.000   0.598       -0.001       0.759   0.542   0.001  -0.001     0.008
bed               0.149   0.598   1.000        0.007       0.715   0.350   0.003  -0.000     0.004
brokered_by      -0.004  -0.001   0.007        1.000      -0.001   0.001   0.081   0.000     0.065
house_size        0.252   0.759   0.715       -0.001       1.000   0.535   0.000  -0.003     0.011
price            -0.070   0.542   0.350        0.001       0.535   1.000   0.000  -0.126     0.128
status            0.003   0.001   0.003        0.081       0.000   0.000   1.000   0.119     0.123
street            0.159  -0.001  -0.000        0.000      -0.003  -0.126   0.119   1.000     0.002
zip_code         -0.072   0.008   0.004        0.065       0.011   0.128   0.123   0.002     1.000
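
The High correlation alerts in the Overview can be traced back to this matrix. The sketch below lists every pair whose absolute coefficient is at least 0.5; that threshold is an assumption chosen because it reproduces the alerts, and the report does not state the cutoff or the correlation measure it uses.

```python
names = ["acre_lot", "bath", "bed", "brokered_by", "house_size",
         "price", "status", "street", "zip_code"]

# Coefficients copied row by row from the correlation table.
matrix = [
    [1.000, 0.108, 0.149, -0.004, 0.252, -0.070, 0.003, 0.159, -0.072],
    [0.108, 1.000, 0.598, -0.001, 0.759, 0.542, 0.001, -0.001, 0.008],
    [0.149, 0.598, 1.000, 0.007, 0.715, 0.350, 0.003, -0.000, 0.004],
    [-0.004, -0.001, 0.007, 1.000, -0.001, 0.001, 0.081, 0.000, 0.065],
    [0.252, 0.759, 0.715, -0.001, 1.000, 0.535, 0.000, -0.003, 0.011],
    [-0.070, 0.542, 0.350, 0.001, 0.535, 1.000, 0.000, -0.126, 0.128],
    [0.003, 0.001, 0.003, 0.081, 0.000, 0.000, 1.000, 0.119, 0.123],
    [0.159, -0.001, -0.000, 0.000, -0.003, -0.126, 0.119, 1.000, 0.002],
    [-0.072, 0.008, 0.004, 0.065, 0.011, 0.128, 0.123, 0.002, 1.000],
]

threshold = 0.5  # assumed cutoff, not stated in the report

# Walk the upper triangle so each pair is reported once.
pairs = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(matrix[i][j]) >= threshold:
            pairs.append((names[i], names[j], matrix[i][j]))

for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

The five pairs found (bath with bed, house_size, and price; bed with house_size; house_size with price) line up with the counts of "other fields" named in the four alerts.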

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
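
Nullity correlation can be read as an ordinary correlation between missingness indicators. The sketch below shows one common way to compute it, using Pearson correlation on 0/1 flags over hypothetical toy rows; the report's own implementation is not shown in this document and may differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy rows; None marks a missing cell.
rows = [
    {"bed": 3, "bath": 2, "house_size": 1500},
    {"bed": None, "bath": None, "house_size": None},
    {"bed": 4, "bath": 3, "house_size": None},
    {"bed": None, "bath": None, "house_size": 900},
    {"bed": 2, "bath": 1, "house_size": 800},
]

def nullity(column):
    """1.0 where the column is missing, 0.0 where it is present."""
    return [1.0 if row[column] is None else 0.0 for row in rows]

print(pearson(nullity("bed"), nullity("bath")))        # always missing together
print(pearson(nullity("bed"), nullity("house_size")))  # only weakly related
```

A value near 1 means two columns tend to be missing in the same rows; a value near 0 means their missingness is unrelated.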

Sample

         brokered_by  status    price     bed  bath  acre_lot  street     city           state        zip_code  house_size  prev_sold_date
0        103378.0     for_sale  105000.0  3.0  2.0   0.12      1962661.0  Adjuntas       Puerto Rico  601.0     920.0       NaN
1        52707.0      for_sale  80000.0   4.0  2.0   0.08      1902874.0  Adjuntas       Puerto Rico  601.0     1527.0      NaN
2        103379.0     for_sale  67000.0   2.0  1.0   0.15      1404990.0  Juana Diaz     Puerto Rico  795.0     748.0       NaN
3        31239.0      for_sale  145000.0  4.0  2.0   0.10      1947675.0  Ponce          Puerto Rico  731.0     1800.0      NaN
4        34632.0      for_sale  65000.0   6.0  2.0   0.05      331151.0   Mayaguez       Puerto Rico  680.0     NaN         NaN
5        103378.0     for_sale  179000.0  4.0  3.0   0.46      1850806.0  San Sebastian  Puerto Rico  612.0     2520.0      NaN
6        1205.0       for_sale  50000.0   3.0  1.0   0.20      1298094.0  Ciales         Puerto Rico  639.0     2040.0      NaN
7        50739.0      for_sale  71600.0   3.0  2.0   0.08      1048466.0  Ponce          Puerto Rico  731.0     1050.0      NaN
8        81909.0      for_sale  100000.0  2.0  1.0   0.09      734904.0   Ponce          Puerto Rico  730.0     1092.0      NaN
9        65672.0      for_sale  300000.0  5.0  3.0   7.46      1946226.0  Las Marias     Puerto Rico  670.0     5403.0      NaN

         brokered_by  status    price     bed  bath  acre_lot  street     city           state        zip_code  house_size  prev_sold_date
2226372  108243.0     sold      425000.0  3.0  3.0   0.06      970797.0   Richland       Washington   99354.0   1876.0      2022-02-14
2226373  16235.0      sold      305000.0  4.0  2.0   0.42      353937.0   Richland       Washington   99354.0   2000.0      2022-02-11
2226374  53860.0      sold      310000.0  3.0  1.0   0.21      500240.0   Richland       Washington   99354.0   1152.0      2022-02-11
2226375  60631.0      sold      385000.0  4.0  2.0   0.21      210890.0   Richland       Washington   99354.0   1656.0      2022-03-28
2226376  85499.0      sold      339900.0  4.0  2.0   0.20      41160.0    Richland       Washington   99354.0   2780.0      2022-03-28
2226377  23009.0      sold      359900.0  4.0  2.0   0.33      353094.0   Richland       Washington   99354.0   3600.0      2022-03-25
2226378  18208.0      sold      350000.0  3.0  2.0   0.10      1062149.0  Richland       Washington   99354.0   1616.0      2022-03-25
2226379  76856.0      sold      440000.0  6.0  3.0   0.50      405677.0   Richland       Washington   99354.0   3200.0      2022-03-24
2226380  53618.0      sold      179900.0  2.0  1.0   0.09      761379.0   Richland       Washington   99354.0   933.0       2022-03-24
2226381  108243.0     sold      580000.0  5.0  3.0   0.31      307704.0   Richland       Washington   99354.0   3615.0      2022-03-23
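
The sample shows zip_code stored as a floating-point number, so Puerto Rico codes such as 601.0 have lost their leading zeros. A small sketch of restoring the conventional form, assuming these are standard five-digit US ZIP codes (the report does not say so); `format_zip` is a hypothetical helper.

```python
def format_zip(value):
    """Render a float-typed ZIP code as a 5-digit string; None or NaN gives None."""
    if value is None or value != value:  # NaN is the only value not equal to itself
        return None
    return f"{int(value):05d}"

print(format_zip(601.0))    # Adjuntas, Puerto Rico row in the sample
print(format_zip(99354.0))  # Richland, Washington row in the sample
```

Treating zip_code as text rather than a real number would also keep it out of numeric summaries such as mean and variance, which carry little meaning for an identifier.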